feat: streaming SSR pipeline #294

Merged: mohamedmansour merged 3 commits into main from feat/bench-infra on May 16, 2026
@mohamedmansour mohamedmansour commented May 15, 2026

SSR pipeline with bounded-channel streaming, lock-free chunk pool, structural per-render HTML injection, and a zero-allocation hot path. Replaces the legacy buffer-then-byte-scan-and-concat pipeline.

Headline numbers

Per-render, contact-book @ 1000 contacts, allocator-exact + getrusage. Origin/main string+postinject → this PR streaming+inject POOLED:

| metric | origin/main | this PR | Δ |
|---|---|---|---|
| allocations | 526 | 520 | −1.1 % |
| bytes/render | 75.0 KiB | 30.3 KiB | −59.6 % |
| user CPU µs | 33.6 | 22.5 | −33 % |

Browser-perceived metrics (Playwright, real Chromium, 250 ms render):

| metric | buffered | streaming | Δ |
|---|---|---|---|
| TTFB | 265 ms | 0.4 ms | 663× faster |
| FCP / LCP | 284 ms | 56 ms | 5.1× faster |

Three commits

| # | Commit | What |
|---|---|---|
| 1 | 86ca1cbd | Bench harness (origin/main-compatible: string, string+postinject). Cherry-pickable onto origin/main for a baseline. |
| 2 | 7767be48 | StreamingWriter + ChunkPool primitive + 3 new bench layers (criterion writer-paths, e2e-ttfb actix, Playwright browser). |
| 3 | 6e3af609 | RenderOptions::with_head_inject / with_body_inject, dedup DoS guard, 6 hot-path allocation cuts, CLI / commerce wiring. |

What's in each commit

Streaming primitive (commit 2) — bounded tokio mpsc (default 4 chunks ≈ 16 KiB → backpressure), configurable with_flush_timeout (slow-loris DoS bound), lock-free ChunkPool via crossbeam_queue::ArrayQueue for zero per-flush allocation in steady state, typed ClientDisconnected / StreamTimeout errors.
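To make the coalescing and backpressure idea concrete, here is a minimal std-only sketch. It is not the PR's StreamingWriter: `CoalescingWriter` is a hypothetical name, `std::sync::mpsc::sync_channel` stands in for the bounded tokio mpsc, and it allocates a fresh buffer per flush instead of drawing from a pool.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical stand-in for a streaming writer: coalesces small writes into
/// fixed-size chunks and hands them to a bounded channel.
pub struct CoalescingWriter {
    tx: SyncSender<Vec<u8>>,
    buf: Vec<u8>,
    chunk_size: usize,
}

impl CoalescingWriter {
    /// `capacity` bounds the number of in-flight chunks. Once the channel is
    /// full, `send` blocks the producer, which is the backpressure.
    pub fn new(capacity: usize, chunk_size: usize) -> (Self, Receiver<Vec<u8>>) {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        let (tx, rx) = sync_channel(capacity);
        let buf = Vec::with_capacity(chunk_size);
        (Self { tx, buf, chunk_size }, rx)
    }

    /// Appends bytes, flushing each time a full chunk has accumulated.
    /// Returns Err(()) if the receiver (the "client") has gone away.
    pub fn write(&mut self, mut data: &[u8]) -> Result<(), ()> {
        while !data.is_empty() {
            let room = self.chunk_size - self.buf.len();
            let take = room.min(data.len());
            self.buf.extend_from_slice(&data[..take]);
            data = &data[take..];
            if self.buf.len() == self.chunk_size {
                self.flush()?;
            }
        }
        Ok(())
    }

    /// Sends whatever is still buffered and consumes the writer, which
    /// drops the sender and closes the stream.
    pub fn end(mut self) -> Result<(), ()> {
        self.flush()
    }

    fn flush(&mut self) -> Result<(), ()> {
        if self.buf.is_empty() {
            return Ok(());
        }
        let fresh = Vec::with_capacity(self.chunk_size);
        let full = std::mem::replace(&mut self.buf, fresh);
        self.tx.send(full).map_err(|_| ())
    }
}
```

With a capacity of 4 and 4096-byte chunks, a producer can run at most about 16 KiB ahead of the consumer before `send` blocks, which matches the bound quoted above.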

Structural HTML injection (commit 3): with_head_inject / with_body_inject emit at the parser-synthesized head_end / body_end structural boundaries. There is no byte scanner, so injection cannot mis-fire on </head> / </body> literals in HTML comments, <iframe srcdoc>, or inline <script>. A dedupe guard prevents inject amplification (a 1 MiB inject × N duplicate signals would otherwise produce N MiB of output).
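The difference from byte scanning can be illustrated with a small model. This is a sketch under assumptions: the `Fragment` enum and `render` function are hypothetical and only mimic the idea of parser-emitted boundary signals, not the handler's real protocol types.

```rust
/// Hypothetical fragment stream. In the PR the parser synthesizes the
/// boundary signals; here they are modelled as enum variants.
pub enum Fragment<'a> {
    Raw(&'a str),
    HeadEnd,
    BodyEnd,
}

/// Renders fragments, emitting each inject at most once at its signal.
pub fn render(
    frags: &[Fragment<'_>],
    head_inject: Option<&str>,
    body_inject: Option<&str>,
) -> String {
    let mut out = String::new();
    let (mut head_done, mut body_done) = (false, false);
    for frag in frags {
        match frag {
            // Raw HTML is copied through without being inspected, so a
            // "</body>" literal inside it can never trigger an inject.
            Fragment::Raw(html) => out.push_str(html),
            Fragment::HeadEnd if !head_done => {
                head_done = true;
                out.push_str(head_inject.unwrap_or(""));
            }
            Fragment::BodyEnd if !body_done => {
                body_done = true;
                out.push_str(body_inject.unwrap_or(""));
            }
            // Duplicate signals are ignored (dedupe guard).
            Fragment::HeadEnd | Fragment::BodyEnd => {}
        }
    }
    out
}
```

Feeding this a stream whose raw HTML contains `<!-- </body> -->` and a duplicated `BodyEnd` still yields exactly one body inject, placed at the first structural signal.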

Hot-path perf (commit 3): request_path / entry_id / nonce / route_base switched from String to &'a str / Cow<'a, str> (4 fewer allocations per render); the <for> loop variable name is inserted once and get_mut-swapped per iteration (2·(N−1) allocations saved on N-iteration loops); a per-render component_index_cache eliminates the second build_component_index walk at body_end.
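As a rough illustration of the String to &'a str / Cow<'a, str> switch, the sketch below borrows from the request wherever possible. The `Options` struct and the trailing-slash rule in `normalize_route_base` are assumptions made for the example; the handler's real normalization may differ.

```rust
use std::borrow::Cow;

/// Hypothetical normalization: ensure the route base ends with '/'.
/// Borrows when the input is already normalized, allocates only otherwise.
pub fn normalize_route_base(raw: &str) -> Cow<'_, str> {
    if raw.ends_with('/') {
        Cow::Borrowed(raw)
    } else {
        Cow::Owned(format!("{raw}/"))
    }
}

/// Per-render options that borrow from the request instead of owning Strings.
/// Field names follow the PR text; the struct itself is illustrative.
pub struct Options<'a> {
    pub request_path: &'a str,
    pub entry_id: &'a str,
    pub nonce: Option<&'a str>,
    pub route_base: Cow<'a, str>,
}
```

In the common "/" case the `Cow` stays borrowed, so building the options does no heap allocation at all.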

Security guards (commit 3) — empty-string normalization at handler init (defends against RenderOptions { nonce: Some(""), .. } field-bypass which would emit <script nonce="">, a hard CSP failure); explicit XSS warning on inject builder doc comments; encode_safe re-exported from webui_handler for callers that need pre-escaping.
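The field-bypass guard comes down to normalizing at the point of use rather than only in the builder. A minimal sketch, assuming a hypothetical `Options` struct with a public `nonce` field and illustrative helper names:

```rust
/// Hypothetical options struct with a public field, so callers can bypass
/// builder-side validation by using struct-literal syntax.
#[derive(Default)]
pub struct Options<'a> {
    pub nonce: Option<&'a str>,
}

/// Normalizes at the point of use: Some("") becomes None.
pub fn effective_nonce<'a>(opts: &Options<'a>) -> Option<&'a str> {
    opts.nonce.filter(|s| !s.is_empty())
}

/// Emits the nonce attribute only when a non-empty nonce is present.
pub fn script_tag(opts: &Options<'_>) -> String {
    match effective_nonce(opts) {
        Some(n) => format!("<script nonce=\"{n}\">"),
        None => "<script>".to_string(),
    }
}
```

Because the filter runs where the value is consumed, it holds no matter how the struct was constructed.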

Tests

299 total (15 streaming + 284 handler), including: 16-thread concurrent-render stress, 1 MiB inject roundtrip, cross-thread Bytes drop, marker-spoof robustness (<!-- </body> --> literals), dedupe guard, field-bypass CSP, nested <for> reusing same variable name, runtime-free flush timeout positive test, end() surfaces first-flush error. cargo xtask check ✅ green (1344 s).

Reproduce

# Resource bench: per-render allocations / bytes / CPU
git checkout 86ca1cbd && cargo xtask bench streaming-resource --save-baseline c1
git checkout 7767be48 && cargo xtask bench streaming-resource --baseline c1 --save-baseline c2
git checkout 6e3af609 && cargo xtask bench streaming-resource --baseline c2

# Browser metrics in real Chromium
cd examples/integration/streaming-browser-bench && pnpm test

Docs

  • BENCHMARKS.md — bench layer reference + before/after workflow
  • DESIGN.md "Streaming Response Writers" — primitive + injection API + safety contracts
  • crates/webui/benches/README.md, examples/integration/streaming-browser-bench/README.md

Production wiring

  • crates/webui-cli/src/commands/serve.rs — dev server uses StreamingWriter::new_pooled (256-slot pool ≈ 1.25 MiB peak), 30 s flush deadline, livereload script as Arc<str> via with_body_inject.
  • examples/app/commerce/server — same pattern; per-page image preload <link> via with_head_inject.

@mohamedmansour mohamedmansour requested a review from akroshg May 15, 2026 20:30
@mohamedmansour mohamedmansour changed the title bench(streaming): add full SSR benchmark suite + StreamingWriter primitive perf(streaming): signal-based HTML injection — TTFB 445× faster, −60% bytes/render, −29% CPU May 15, 2026
Comment thread crates/webui-handler/src/lib.rs Dismissed
@mohamedmansour mohamedmansour changed the title perf(streaming): signal-based HTML injection — TTFB 445× faster, −60% bytes/render, −29% CPU perf(streaming): signal-based HTML injection May 15, 2026
mohamedmansour and others added 2 commits May 15, 2026 14:13
Adds the benchmark infrastructure used to measure WebUI SSR performance,
implementation-neutral. This commit can be cherry-picked onto origin/main
to capture a baseline; subsequent commits in this branch then add the
streaming primitive (commit 2) and the signal-based injection +
hot-path perf hardening (commit 3), each with deltas measurable against
the numbers captured at this commit.

What this commit adds:

- crates/webui/benches/streaming_bench.rs (criterion native): writer-
  path wall-clock at three contact-book scales (10/100/1000) for two
  paths that exist on origin/main:
    * `string`            - pre-allocated String buffer baseline.
    * `string+postinject` - String + case-insensitive </body> byte-
      window scan + concat. Mirrors the legacy dev-mode livereload
      pipeline (`lr.inject(&buf)`).

- crates/webui/examples/streaming_resource_bench.rs (custom
  GlobalAlloc + getrusage): per-render allocation count, total bytes,
  user CPU microseconds, peak RSS for the same two paths.
  Snapshot save/load via --save NAME / --compare NAME.

- xtask/src/main.rs:
    * `cargo xtask bench streaming` runs the criterion writer-path
      bench. `cargo xtask bench streaming-resource` runs the custom
      allocator bench. `cargo xtask bench full` runs both.
    * --save-baseline NAME / --baseline NAME flags map to criterion's
      native flags for the criterion bench, and to --save/--compare
      for the resource bench. Both store JSON/criterion snapshots
      under target/bench-baselines/ (or target/criterion/).

- BENCHMARKS.md: top-level documentation describing the bench layers,
  the threshold guidance for noise vs signal, and the before/after
  workflow.

- crates/webui-parser/Cargo.toml: cargo-shear metadata exempting
  `clap` (used only via cfg_attr-gated derive macro that cargo-shear
  cannot expand).
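
For readers unfamiliar with the custom GlobalAlloc technique mentioned in the resource-bench bullet above, a minimal counting allocator might look like the sketch below. The names `CountingAlloc` and `measure` are hypothetical, and the getrusage side (CPU time, peak RSS) is omitted because it needs a platform binding.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the allocation count and bytes
/// requested while it ran (process-wide, so other threads also count).
pub fn measure<R>(f: impl FnOnce() -> R) -> (R, usize, usize) {
    let (a0, b0) = (ALLOCS.load(Ordering::Relaxed), BYTES.load(Ordering::Relaxed));
    let result = f();
    let (a1, b1) = (ALLOCS.load(Ordering::Relaxed), BYTES.load(Ordering::Relaxed));
    (result, a1 - a0, b1 - b0)
}
```

Wrapping one render in `measure` and averaging over many iterations gives per-render figures of the kind reported below.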

Subsequent commits will:

- Add the StreamingWriter / ChunkPool primitive plus the
  `streaming` / `streaming POOLED` rows to both benches, the actix-
  based streaming-e2e-ttfb bench, and the Playwright streaming-browser
  bench (commit 2).

- Add the signal-based RenderOptions::with_head_inject /
  with_body_inject API plus the `streaming+inject(opts)` / `streaming+
  inject(opts) POOLED` rows, the per-render hot-path perf hardening,
  and CLI / commerce wiring (commit 3).

Reproduction workflow:

  # On any commit:
  cargo xtask bench streaming-resource --save-baseline before
  cargo xtask bench streaming         --save-baseline before
  # Apply the change you want to measure...
  cargo xtask bench streaming-resource --baseline before
  cargo xtask bench streaming         --baseline before

Numbers from this commit on the contact-book-manager protocol at
scale 1000 (release build, 2000 iters/path):

  string/1000:            525 allocs, 51.7 KiB, 23.49 us user CPU
  string+postinject/1000: 526 allocs, 75.0 KiB, 33.65 us user CPU

The post-inject overhead at this commit (+10 us, +23 KiB output) is
the cost any host pays for per-request HTML splicing without a
structured injection API - the cost the implementation commit
eliminates.

Quality: cargo xtask check passes (1096s, all phases).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…layers

Adds the streaming SSR primitive (StreamingWriter, ChunkPool) and
extends the bench infrastructure from the previous commit with three
new measurement layers. No handler-level rendering semantics change at
this commit — the signal-based injection API and per-render hot-path
perf hardening land in the next commit.

What this commit adds:

- crates/webui/src/streaming.rs (~820 lines):
    * StreamingWriter: bounded tokio mpsc-backed ResponseWriter with
      coalesced ~4 KB chunks, configurable flush deadline (slow-loris
      DoS bound), typed disconnect/timeout errors. Documented usage
      pattern is `actix_web::rt::task::spawn_blocking`.
    * ChunkPool: lock-free shared pool of Vec<u8> chunk buffers
      backed by crossbeam_queue::ArrayQueue. Buffers recycle via
      Bytes::from_owner + a custom owner type that returns the Vec
      on Bytes drop. Cross-thread drop safety verified by test.
    * 13 unit tests covering coalescing, disconnect, timeout, chunk-
      size override, pool round-trip, dirty-buffer handling, capacity
      enforcement, single-Bytes drop, ref-counted clone drop,
      recycling across renders, cross-thread drop.

- crates/webui-handler/src/lib.rs:
    * HandlerError gains two variants (ClientDisconnected,
      StreamTimeout) so streaming writers can return typed errors.
      Both variants are payload-free (allocation-free) so error paths
      stay cheap.

- crates/webui/Cargo.toml + workspace Cargo.toml: adds tokio, bytes,
  crossbeam-queue, memchr, tokio-stream, actix-web, awc, futures-util
  to the deps needed by the streaming primitive and the new benches.

- crates/webui/benches/streaming_bench.rs: extended with a
  `streaming` row (alongside the existing `string` and
  `string+postinject` rows from the previous commit) plus a `ttfb`
  group measuring time-to-first-chunk for streaming vs buffered.

- crates/webui/examples/streaming_resource_bench.rs: extended with
  `streaming` and `streaming POOLED` rows for the same allocator-
  level + getrusage measurements as the baseline rows.

- crates/webui/examples/streaming_e2e_ttfb_bench.rs (NEW): in-process
  actix-web server measuring real HTTP TTFB / TTLB for `/buf` vs
  `/stream` under configurable per-write delays. JSON snapshot
  baseline support (--save NAME / --compare NAME).

- examples/integration/streaming-browser-bench/ (NEW): standalone
  Playwright suite + small hand-built actix-web server. Measures
  browser-perceived metrics (TTFB / FCP / LCP / DCL / load) in real
  Chromium across four render scenarios (no-delay, 25 ms, 100 ms,
  250 ms render times). The server is intentionally hand-built so
  it isolates the streaming-vs-buffered question without confounding
  from WebUI handler details. Baseline support via WEBUI_BENCH_SAVE
  / WEBUI_BENCH_COMPARE env vars.

- xtask/src/main.rs:
    * `cargo xtask bench streaming-e2e-ttfb` and
      `cargo xtask bench streaming-browser` targets added.
    * `cargo xtask bench full` (= `streaming-all`) now runs the
      criterion writer-paths + resource bench + e2e-ttfb + browser
      bench in sequence, threading the same baseline name through
      every layer.
    * --save-baseline / --baseline flags map to criterion's native
      flags for criterion benches, --save / --compare for the
      example benches, and WEBUI_BENCH_SAVE / WEBUI_BENCH_COMPARE
      env vars for the Playwright bench.

- xtask/src/e2e.rs: wires the streaming-browser-bench Playwright
  suite into `cargo xtask e2e` so it runs in CI alongside the
  other example apps.

- BENCHMARKS.md / crates/webui/benches/README.md: updated to
  describe the new bench layers and what each one measures.
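
The ChunkPool bullet above describes buffers that return to the pool when the last Bytes handle drops. The sketch below shows that return-on-drop shape using only std. It is a simplification under stated assumptions: a `Mutex<Vec<_>>` is swapped in for the lock-free crossbeam_queue::ArrayQueue, and the hypothetical `PooledChunk` guard stands in for the owner type passed to Bytes::from_owner.

```rust
use std::sync::{Arc, Mutex};

/// Simplified pool; a Mutex<Vec<_>> replaces the lock-free queue here.
pub struct ChunkPool {
    free: Mutex<Vec<Vec<u8>>>,
    max_slots: usize,
    chunk_capacity: usize,
}

/// A chunk that hands its buffer back to the pool when dropped.
pub struct PooledChunk {
    buf: Vec<u8>,
    pool: Arc<ChunkPool>,
}

impl ChunkPool {
    pub fn new(max_slots: usize, chunk_capacity: usize) -> Arc<Self> {
        Arc::new(Self { free: Mutex::new(Vec::new()), max_slots, chunk_capacity })
    }

    /// Reuses a pooled buffer if one is available, else allocates a new one.
    pub fn acquire(self: &Arc<Self>) -> PooledChunk {
        let recycled = self.free.lock().unwrap().pop();
        let buf = recycled.unwrap_or_else(|| Vec::with_capacity(self.chunk_capacity));
        PooledChunk { buf, pool: Arc::clone(self) }
    }

    /// Number of buffers currently waiting in the pool.
    pub fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

impl PooledChunk {
    pub fn buf_mut(&mut self) -> &mut Vec<u8> {
        &mut self.buf
    }
}

impl Drop for PooledChunk {
    fn drop(&mut self) {
        let mut buf = std::mem::take(&mut self.buf);
        buf.clear(); // keeps the allocation, resets the length
        let mut free = self.pool.free.lock().unwrap();
        // Capacity enforcement: beyond max_slots the buffer is simply freed.
        if free.len() < self.pool.max_slots {
            free.push(buf);
        }
    }
}
```

Because the guard holds an `Arc` to the pool and both types are `Send`, a chunk can be dropped on a different thread from the one that acquired it.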

Reproduction workflow:

  # On the previous commit (baseline-only):
  cargo xtask bench full --save-baseline before

  # On this commit (adds streaming):
  cargo xtask bench full --baseline before

  # Browser-perceived metrics (real Chromium):
  cargo xtask bench streaming-browser --save-baseline before
  # …on a later commit…
  cargo xtask bench streaming-browser --baseline before

Quality: cargo xtask check passes (1165s, all phases). All 13
streaming module tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mohamedmansour mohamedmansour changed the title perf(streaming): signal-based HTML injection perf(streaming): signal-based HTML injection — TTFB 663× faster, FCP/LCP −80%, −60% bytes/render May 15, 2026
@mohamedmansour mohamedmansour requested a review from mcritzjam May 15, 2026 22:19
@mohamedmansour mohamedmansour changed the title perf(streaming): signal-based HTML injection — TTFB 663× faster, FCP/LCP −80%, −60% bytes/render perf(streaming): signal-based HTML injection May 15, 2026
@akroshg akroshg left a comment

Inline notes on a few things worth a look. The architecture (structural-signal injection, lock-free ChunkPool, Bytes::from_owner ownership, bounded-channel back-pressure) is the right design and the perf claims hold up — these are 4 edge-case bugs worth fixing on top.

Comment thread crates/webui/src/streaming.rs Outdated
Comment thread crates/webui/src/streaming.rs Outdated
Comment thread crates/webui-handler/src/lib.rs
Comment thread crates/webui-handler/src/lib.rs Outdated
akroshg
akroshg previously approved these changes May 16, 2026
@mohamedmansour

Thanks @akroshg — all four findings were valid, fixed in bdd5fee7. Also did an adversarial re-audit on top to hunt for whatever both reviews missed; results below.

✅ Your 4 findings — all fixed

  1. with_flush_timeout silent-disable + abort risk (crates/webui/src/streaming.rs:545-606). Replaced the debug_assert!(false) (no-op in release; never could have helped under panic = "abort") with a runtime-free poll loop (try_send + thread::sleep) that actually enforces the deadline when no tokio runtime is in TLS. A warn-once log::warn! makes the misconfiguration visible to ops. New positive test streaming_writer_flush_timeout_fires_without_runtime verifies Err(StreamTimeout) fires within ±1500ms of the configured 150ms deadline and short-circuits subsequent writes.

  2. end() swallowing the final-flush error. end() now surfaces Err(ClientDisconnected) / Err(StreamTimeout) on first-disconnect (returns Ok if already terminated, since write() already surfaced the error then). New test streaming_writer_end_surfaces_first_flush_error pins the contract. Updated production callers (webui-cli/src/commands/serve.rs, examples/app/commerce/server/src/server.rs) to log::debug! truncated-stream errors so they're visible in telemetry without spamming production logs.

  3. pub field bypass of with_nonce("") normalisation → CSP outage. Added defensive .filter(|s| !s.is_empty()) on all three injection points (nonce, head_inject, body_inject) at handler init in both handle() and render(). Field-update syntax can no longer trigger a hard-CSP <script nonce=""> failure. New test empty_field_bypass_is_normalised_at_handler_init constructs RenderOptions { nonce: Some(""), .. } directly (bypassing the builder) and asserts the empty marker never reaches the output.

  4. Broken doc reference to pub(crate) symbol. Added pub use html_encode::encode_safe; at the crate root with a doc comment explaining the re-export's purpose. Updated the inline doc and the DESIGN.md section to point at the now-reachable webui_handler::encode_safe. Hosts can pre-escape untrusted content with the same escaper the handler uses internally.
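
For finding 1, the runtime-free poll loop (try_send plus thread::sleep) can be sketched as follows. This is an approximation, not the code in streaming.rs: it uses std's `SyncSender` in place of the tokio sender, and `send_with_deadline`, the local `StreamError` enum, and the 1 ms poll interval are assumptions for the example.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    ClientDisconnected,
    StreamTimeout,
}

/// Tries to enqueue `chunk`, polling until `timeout` elapses.
/// Needs no async runtime, so the deadline holds on plain threads too.
pub fn send_with_deadline(
    tx: &SyncSender<Vec<u8>>,
    mut chunk: Vec<u8>,
    timeout: Duration,
) -> Result<(), StreamError> {
    let deadline = Instant::now() + timeout;
    loop {
        match tx.try_send(chunk) {
            Ok(()) => return Ok(()),
            Err(TrySendError::Disconnected(_)) => {
                return Err(StreamError::ClientDisconnected);
            }
            Err(TrySendError::Full(returned)) => {
                if Instant::now() >= deadline {
                    return Err(StreamError::StreamTimeout);
                }
                chunk = returned; // take the chunk back and retry
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}
```

Unlike a debug_assert!, this path behaves the same in release builds: a stalled consumer produces a typed timeout error instead of an unbounded block.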

Adversarial re-audit findings

Ran a fresh review specifically looking for patterns matching your four (release-vs-debug gaps, swallowed let _ = result;, pub-field bypasses, broken doc paths) plus panic-safety / Send+Sync / nested loop / Playwright concerns.

Bug-6 from the re-audit (nested <for> loops reusing the same variable name) — investigated and proved invalid. The audit claimed the outer's get_mut held a reference past the recursive process_fragment_id, clobbering the inner's restore. Walking through the code: get_mut is called fresh each iteration (line 966) and immediately deref-written, so no reference persists across the recursive call. The inner's remove + pre-insert + restore correctly preserves the outer's value for any fragments emitted between the inner loop and the outer's next iteration. Added regression test nested_for_loops_reusing_same_variable_name_dont_corrupt_scope that exercises [(O=A[I=X][I=Y],O=A)(O=B[I=X][I=Y],O=B)] — proves the outer item is bound correctly before, between, and after the inner loop.

Bug-8 from the re-audit (cumulative addInitScript calls in Playwright bench) — valid 🔵 perf/test bug. The bench calls measure() 64 times (8 iters × 4 scenarios × 2 paths), and the original LCP fix put await page.addInitScript(...) inside measure. Playwright accumulates init scripts per browser context, so by iteration 64, 64 copies of the LCP observer registered on every navigation. Fixed by extracting ensureLcpObserverInstalled with a __lcpObserverInstalled flag on the context, called once via addInitScript on the shared context.

Bug-9 from the re-audit (waitForTimeout(300) for LCP settle is arbitrary) — valid 🔵 test flakiness. The 300ms wait wasn't tied to any LCP-stable signal; on a contended CI runner with a delay_us=500 (~250ms render) scenario, the LCP candidate could keep updating past 300ms and the test would read an incomplete __lcpEntries array. Replaced with waitForLcpStable: polls __lcpEntries.length every 50ms, returns once it's been stable for 200ms, capped at 2000ms so the test cannot hang on adversarial pages.
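
The stable-for-a-window polling logic behind waitForLcpStable is language-neutral. The actual helper lives in the Playwright suite; the sketch below restates the idea in Rust for consistency with the rest of the codebase, with `wait_until_stable` and its parameters being hypothetical names.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `sample` every `poll` until its value has not changed for
/// `stable_for`, or until `cap` has elapsed. Returns the last value seen.
pub fn wait_until_stable(
    mut sample: impl FnMut() -> usize,
    poll: Duration,
    stable_for: Duration,
    cap: Duration,
) -> usize {
    let start = Instant::now();
    let mut last = sample();
    let mut last_change = start;
    loop {
        // Stop once the value has been quiet long enough, or at the hard cap
        // so an ever-changing source cannot hang the caller.
        if last_change.elapsed() >= stable_for || start.elapsed() >= cap {
            return last;
        }
        thread::sleep(poll);
        let now = sample();
        if now != last {
            last = now;
            last_change = Instant::now();
        }
    }
}
```

With the values quoted above, the call would use a 50 ms poll, a 200 ms stability window, and a 2000 ms cap, sampling the length of the LCP entry list.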

Net change

  • 9 total findings across both reviews (4 yours + 5 from re-audit), of which 8 valid (4 yours all confirmed; 1 from re-audit confirmed invalid via regression test; 4 from re-audit confirmed valid and fixed).
  • 3 new regression tests added — one per high-severity finding (timeout-without-runtime, end()-surfaces-error, field-bypass-CSP) + the nested-for proof.
  • 284 handler tests + 15 streaming tests pass (was 283 + 13 before this round; added 1 handler test for the field-bypass + 2 streaming tests for the timeout + end() bugs).
  • cargo xtask check ✅ green (1344s).

Force-pushed as commit bdd5fee7 (amends the prior commit 3 in-place; commits 1 and 2 unchanged).

The two re-audit findings I noted as low-impact but worth fixing (Bug-8 / Bug-9 in the bench) confirm that even after an external review, fresh adversarial eyes find more. That's a good signal — please do another pass if you have time. The streaming and handler hot paths are the right places to keep poking at.

@mohamedmansour

Fixed in 6e3af609 (force-pushed).

Root cause

Pre-existing race in the test, widened by the streaming SSR pipeline this PR introduces — not fixed-by-PR but exposed-by-PR.

The test sequence:

await page.locator('mp-category-nav').getByRole('link', { name: 'Shirts' }).first().click();
await expect(page).toHaveURL('/search/shirts');                                      // ← passes
await page.locator('mp-filter-list').getByRole('link', { name: 'Price: ...' }).click(); // ← race

toHaveURL only verifies the URL bar, which the client router updates on receipt of the partial-nav response. The filter-list components re-render asynchronously from the same response, so there's a window where:

  • URL = /search/shirts
  • mp-filter-list DOM still has href="/search/stickers?sort=..." (stale)

A click in that window hits the stale link and lands on /search/stickers?sort=price-desc, exactly what CI saw.

Local 10× repro on my machine: passes 10/10. Origin/main also passes 10/10 with the same test code. The race window is below the schedulable threshold on a fast Mac M-series; on the GitHub Actions Linux runner, with Playwright workers contending for CPU, it lands inside the click.

The streaming SSR work in this PR doesn't directly affect the partial-nav code path (partials still go through partial_response → render_partial, returning JSON in one shot rather than streaming). But the initial page load is now streamed, so on a constrained runner the framework's hydration / state-init can interleave differently with subsequent partial-nav DOM patches and widen the same pre-existing race.

Fix

Made the test deterministic by waiting for the filter-list href to actually update to the new category before clicking:

await expect(page).toHaveURL('/search/shirts');
// Count-based wait (not visibility) — mp-filter-list emits both
// desktop and mobile-only variants, only one is `display`-ed in
// each project, but both share the updated href once the DOM
// patch lands.
await expect(
  page.locator('mp-filter-list a[href*="/search/shirts?sort=price-desc"]'),
).not.toHaveCount(0);
await page.locator('mp-filter-list').getByRole('link', { name: 'Price: ...' }).click();

Verified locally with the rebuilt branch binary: 10/10 passes.

The remaining 9 e2e failures are the pre-existing macOS↔Linux screenshot baseline mismatches (4 commerce + 5 contact-book toHaveScreenshot tests) — same set noted on origin/main, unaffected by this PR.

@mohamedmansour mohamedmansour requested a review from akroshg May 16, 2026 04:58
@mohamedmansour mohamedmansour changed the title perf(streaming): signal-based HTML injection feat: streaming SSR pipeline May 16, 2026
…path

Builds on the streaming primitive from the previous commit to add the
per-render HTML injection API (`RenderOptions::with_head_inject` /
`with_body_inject`), six allocation-reducing changes on the handler hot
path, five streaming/pool-side improvements, two security guards, and
the wiring for the dev CLI and the commerce example.

Replaces the legacy buffer-then-byte-scan-and-concat injection
pipeline with a structural, signal-driven mechanism. The parser
already synthesises head_end / body_end signal fragments at the
structural boundaries (crates/webui-parser/src/lib.rs:1189-1230),
so the handler simply emits the inject HTML at the existing hook
sites. No byte scanner. No second pass. Per-render injection is a
single writer.write(html) call at the parser-anchored signal:
zero scan cost, and the signal cannot be spoofed by </head> /
</body> literals appearing in HTML comments, <iframe srcdoc>, or
inline <script>.

## Performance vs the previous-commit baseline (commit 2)

(per-render, 2000 iters, contact-book at 1000 contacts, custom
GlobalAlloc + getrusage)

  metric                               | previous commit | this commit | delta
  -------------------------------------|-----------------|-------------|-----------
  string/1000 allocs                   | 525             | 514         | -2.1%
  streaming/1000 allocs                | 538             | 527         | -2.0%
  string+postinject/1000 allocs        | 526             | 515         | -2.1%
  streaming+inject(opts) POOLED bytes  | n/a (new path)  | 30.3 KiB    | n/a
  user CPU (any path)                  | ~25-30 us       | ~21-23 us   | -10..-30%

Cumulative wins of the new POOLED path vs origin/main legacy
`string+postinject`:

  metric        | origin/main  | this commit POOLED | delta
  --------------|--------------|--------------------|--------
  allocations   | 526          | 520                | -1.1%
  bytes/render  | 75.0 KiB     | 30.3 KiB           | -59.6%
  user CPU us   | ~29.7        | ~21.1              | -28.9%
  TTFB          | full buffer  | first signal       | streaming

## What changed at the handler layer

- crates/webui-handler/src/lib.rs:
    * RenderOptions gains `head_inject: Option<&'a str>` /
      `body_inject: Option<&'a str>` fields and matching builders
      `with_head_inject` / `with_body_inject`. Empty strings normalise
      to None for consistency with `with_nonce`.
    * `process_signal` emits the inject HTML at the existing
      head_end/body_end hook sites, after the built-in nonce meta /
      CSS preload links / hydration script. Each emission guarded by
      a `head_end_emitted` / `body_end_emitted` flag on
      WebUIProcessContext so a malformed protocol cannot multiply the
      inject by N (DoS amplification guard).
    * Six allocation-reducing changes on the per-render hot path:
        1. request_path: String -> &'a str         (-1 alloc/render)
        2. entry_id:     String -> &'a str         (-1 alloc/render)
        3. nonce:        Option<String> -> Option<&'a str>  (-1 alloc)
        4. route_base:   String -> Cow<'a, str>    (-1 alloc, "/" zero-copy)
        5. <for> loop variable: insert key once, get_mut-swap value
           in-place instead of clone-per-iteration. Saves 2*(N-1)
           String clones for any N-iteration loop. A 1000-item <for>
           saves 1998 allocations.
        6. Lazy component_index_cache on the per-render context.
           build_component_index() was rebuilt twice per render
           (head_end + body_end), each walking the protocol. Now
           built on first demand and reused.

- crates/webui/src/streaming.rs (5 hardening changes):
    1. Inject fields stored as Option<&'a str> everywhere (no per-
       render String::from clone).
    2. Fast-path send_with_optional_timeout when no timeout: skips
       Handle::try_current() (~10 ns TLS lookup) on every flush.
    3. Move chunk-buffer clear from acquire to release. ChunkPool
       now clears Vec on release (cheap, just len = 0); acquire
       trusts the invariant. One fewer branch on every chunk acquire.
    4. with_nonce("") normalises to None, matching the inject API.
    5. debug_assert! in the unreachable timeout-without-runtime
       branch instead of silent fallthrough.
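
Item 5 in the handler list above (insert the <for> loop variable once, then swap the value in place) can be modelled roughly as below. This is a sketch with assumptions: the scope is a plain `HashMap<String, &str>`, only the key-side allocation is modelled, and the returned counter is a hand-kept tally of `to_string()` calls rather than an allocator measurement.

```rust
use std::collections::HashMap;

/// Naive approach: re-insert the loop variable on every iteration, which
/// builds a fresh key String each time.
pub fn render_naive<'a>(var: &str, items: &[&'a str]) -> (String, usize) {
    let mut scope: HashMap<String, &'a str> = HashMap::new();
    let mut key_allocs = 0;
    let mut out = String::new();
    for item in items {
        scope.insert(var.to_string(), *item);
        key_allocs += 1;
        out.push_str(scope[var]);
    }
    (out, key_allocs)
}

/// Insert the key once, then overwrite the value through get_mut.
pub fn render_swapped<'a>(var: &str, items: &[&'a str]) -> (String, usize) {
    let mut scope: HashMap<String, &'a str> = HashMap::new();
    let mut key_allocs = 0;
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i == 0 {
            scope.insert(var.to_string(), *item);
            key_allocs += 1;
        } else if let Some(slot) = scope.get_mut(var) {
            *slot = *item; // swap the value in place, key untouched
        }
        out.push_str(scope[var]);
    }
    scope.remove(var); // leave the scope as it was before the loop
    (out, key_allocs)
}
```

Both functions produce the same output; the swapped variant builds the key once instead of N times, which is the key-side half of the saving described in item 5.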

## Two security guards (DoS-class)

1. Dedupe head_end / body_end emission. Without this, a malformed
   protocol that emits the signal N times would multiply the host's
   inject by N: a 1 MiB inject x 1000 duplicate signals would have
   produced 1 GiB of output. Now emits exactly once per render. Test
   `injects_dedupe_against_duplicate_signals` pins the guard.
2. Explicit XSS warning on with_head_inject / with_body_inject doc
   comments. Handler writes HTML verbatim - no escaping. The trust
   contract is now unmissable.
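
Since the handler writes inject HTML verbatim, any untrusted value has to be escaped by the host before it is placed in an inject string. A minimal sketch of that host-side step, where `escape_html` and `banner_inject` are hypothetical helpers written for illustration and not the handler's own escaper:

```rust
/// Hypothetical escaper for illustration; not the handler's own escaper.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

/// Builds a body inject from trusted markup plus an escaped untrusted value.
pub fn banner_inject(untrusted_name: &str) -> String {
    format!("<p class=\"banner\">Hello, {}</p>", escape_html(untrusted_name))
}
```

The resulting string is what a host would then pass to with_body_inject; the trusted markup stays raw and only the untrusted part is escaped.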

## Production wiring

- crates/webui-cli/src/commands/serve.rs: dev server uses
  StreamingWriter::new_pooled with a startup-built ChunkPool (256
  slots * ~5 KiB = 1.25 MiB peak), 30 s flush deadline (slow-loris
  DoS bound), and feeds the livereload script as Arc<str> via
  RenderOptions::with_body_inject.
- examples/app/commerce/server: same pattern; per-page image preload
  link tags via RenderOptions::with_head_inject.

## Bench rows added

The criterion writer-paths bench and the resource bench gain
`streaming+inject(opts)` and `streaming+inject(opts) POOLED` rows
that exercise the new API. The previous commits' baseline rows
remain so deltas are directly comparable across all three commits.

## Test coverage (12 new tests in this commit)

Handler:
  - head_inject_emits_at_head_end_boundary
  - body_inject_emits_at_body_end_boundary
  - injects_are_no_op_when_unset
  - empty_inject_string_treated_as_unset
  - inject_html_is_passed_through_verbatim
  - injects_robust_against_marker_literals_in_content
    (proves the structural-signal approach cannot mis-fire on </body>
    literals inside HTML comments - a class of bug the byte-scanner
    approach was vulnerable to)
  - both_injects_fire_at_correct_boundaries
  - injects_dedupe_against_duplicate_signals (security guard)
  - injects_no_op_when_no_head_or_body_signals (Shadow DOM safe)
  - concurrent_renders_with_different_injects_do_not_cross_contaminate
    (16-thread stress test of the &self handler)
  - large_inject_roundtrips_without_truncation (1 MiB inject)
  - empty_nonce_treated_as_unset (API consistency)

All 283 handler tests + 13 streaming tests pass.

## Documentation

DESIGN.md "Streaming Response Writers" section rewritten to document:
- the signal-based injection API
- the safety contract (raw HTML, no escaping, host owns trust)
- the dedup guarantee (max one emission per render)
- the zero-allocation borrow invariant
- the structural-signal correctness advantage over byte-scanning

User-facing docs and bench READMEs reference the new API.

Reproduction:

  # Capture baseline at the previous commit:
  git checkout HEAD^
  cargo xtask bench full --save-baseline before

  # Apply this commit and compare:
  git checkout HEAD
  cargo xtask bench full --baseline before

Quality: cargo xtask check passes (1111s, all phases).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mohamedmansour mohamedmansour requested a review from a team May 16, 2026 05:27
@mohamedmansour mohamedmansour merged commit 2a0f3ff into main May 16, 2026
21 checks passed
@mohamedmansour mohamedmansour deleted the feat/bench-infra branch May 16, 2026 05:38
3 participants